An Efficient Two-Pass Decoder for SMT Using Word Confidence Estimation
نویسندگان
چکیده
During decoding, the Statistical Machine Translation (SMT) decoder travels over all complete paths on the Search Graph (SG), seeks those with cheapest costs and backtracks to read off the best translations. Although these winners beat the rest in model scores, there is no certain guarantee that they have the highest quality with respect to the human references. This paper exploits Word Confidence Estimation (WCE) scores in the second pass of decoding to enhance the Machine Translation (MT) quality. By using the confidence score of each word in the N-best list to update the cost of SG hypotheses containing it, we hope to “reinforce” or “weaken” them relied on word quality. After the update, new best translations are re-determined using updated costs. In the experiments on our real WCE scores and ideal (oracle) ones, the latter significantly boosts one-pass decoder by 7.87 BLEU points, meanwhile the former yields an improvement of 1.49 points for the same metric.
منابع مشابه
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
متن کاملTwo-pass Continuous Digit String Decoder
In this paper, we present a two-pass continuous digit string decoder using two sets of whole-word HMM models. One set contains context-independent (CI) models used in the first-pass search. The first-pass search results in N-best hypotheses from which a N-best word lattice can be derived. The other set contains context-dependent (CD) HMM models used to search along the N-best word lattice for t...
متن کاملEfficient 2-pass n-best decoder
In this paper, we describe the new BBN BYBLOS efcient 2-Pass N-Best decoder used for the 1996 Hub-4 Benchmark Tests. The decoder uses a quick fastmatch to determine the likely word endings. Then in the second pass, it performs a time-synchronous beam search using a detailed continuous-density HMM and a trigram language model to decide the word starting positions. From these word starts, the dec...
متن کاملDependency Treelet Translation: Syntactically Informed Phrasal SMT
We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. This method requires a source-language dependency parser, target language word segmentation and an unsupervised word alignment component. We align a parallel corpus, project the source dependency parse onto the target sentence, e...
متن کاملHypergraph Training and Decoding of System Combination in SMT
Tranditional n-best based training and decoding method of system combination can propogate the error because of imprecision parameter estimation and too early prunning. In order to alleviate the problem, the paper proposes hypergraph (HG) based three-pass training and three-pass decoding for different features. In order to construct HG, this paper introduces simplified bracket transduction gram...
متن کامل